Acoustic Event Detection in Vehicles: A Multi-Label Classification Approach

Section 2 explains the model.

It assumes the use of a large-scale pre-trained model.

Therefore, deep transfer learning approaches using pre-trained models that are trained in large audio data can be used for downstream classification tasks

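Is the label classification task the downstream task?

I would think the relationships between labels need to be captured, though...

Section 2.3 (Audio Data Augmentation System) discusses generating data for multi-label training.

It says the degree of overlap between sound events can be adjusted.

I should come back to this later; it may be worth revisiting my own multi-label data generation.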
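A minimal sketch of what "adjustable overlap" could look like when building multi-label training clips. This is not the paper's augmentation system; the function name `mix_with_overlap` and the `overlap_ratio` parameter are my own assumptions, and waveforms are plain lists of samples to keep it dependency-free.

```python
def mix_with_overlap(event_a, event_b, overlap_ratio):
    """Mix two event waveforms so that event_b overlaps the tail of event_a.

    overlap_ratio (hypothetical parameter) is the fraction of the shorter
    event that overlaps: 0.0 = back to back, 1.0 = fully overlapped.
    Returns (mixture, span_a, span_b) with spans as (start, end) in samples.
    """
    if not 0.0 <= overlap_ratio <= 1.0:
        raise ValueError("overlap_ratio must be in [0, 1]")

    # Number of samples shared by the two events.
    overlap = int(round(overlap_ratio * min(len(event_a), len(event_b))))
    start_b = len(event_a) - overlap
    total = max(len(event_a), start_b + len(event_b))

    # Sum the two waveforms into one buffer at their respective offsets.
    mixture = [0.0] * total
    for i, x in enumerate(event_a):
        mixture[i] += x
    for i, x in enumerate(event_b):
        mixture[start_b + i] += x

    return mixture, (0, len(event_a)), (start_b, start_b + len(event_b))
```

The returned spans are what would later be turned into multi-hot labels, so the overlap setting directly controls how many segments carry two labels at once.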

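Choosing the temporal resolution also seems to be important.

This paper makes a prediction every 1 second.

Apparently the resolution is chosen with reference to the event durations.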
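To make the resolution idea concrete, here is a rough sketch that turns event annotations into one multi-hot label vector per segment. The 1-second default follows the note above; the function name `segment_labels`, the `(label, onset, offset)` annotation format, and the class names in the usage note are assumptions for illustration, not taken from the paper.

```python
import math


def segment_labels(events, clip_duration, classes, resolution=1.0):
    """Build multi-hot labels per segment of `resolution` seconds.

    events: list of (class_name, onset_sec, offset_sec) tuples (assumed format).
    A segment is marked positive for a class if the event touches it at all.
    """
    n_segments = math.ceil(clip_duration / resolution)
    index = {name: i for i, name in enumerate(classes)}
    labels = [[0] * len(classes) for _ in range(n_segments)]

    for name, onset, offset in events:
        first = int(onset // resolution)                      # first segment touched
        last = min(n_segments, math.ceil(offset / resolution))  # one past the last
        for seg in range(first, last):
            labels[seg][index[name]] = 1

    return labels
```

For example, with hypothetical classes `["horn", "siren"]`, a horn from 0.5 s to 1.5 s and a siren from 1.2 s to 3.0 s in a 3-second clip, the middle segment gets both labels. If the resolution is much longer than typical events, short events get smeared over a whole segment, which is presumably why event duration guides the choice.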

Section 2.6 is the main point.

The BEATs model uses a task-specific linear classifier on the encoder layer to generate labels for downstream classification tasks [20]. Following a similar approach, a linear classifier is considered for the classifier model of the AED system. The BEATs model shows good results for downstream tasks such as audio classification and speech classification [20]. A linear classifier can be a neural network or algorithms such as SVM.

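So what is the model in the end? > 2.6.1

Apparently a linear classifier.

A single-layer NN.

So what was a neural network supposed to be, again?

What does it mean to have only one layer?

Does it mean input > hidden layer (1 layer) > output?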
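If "single-layer" is read in the usual sense, there is no hidden layer at all: one weight matrix maps the encoder embedding straight to one logit per class (input > output), and a per-class sigmoid makes it multi-label. A small sketch under that assumption follows; the function name, the 0.5 threshold, and the tiny dimensions are mine, and whether the paper's classifier matches this exactly is not confirmed by these notes.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def linear_multilabel(embedding, weights, biases, threshold=0.5):
    """Single linear layer + independent sigmoids (no hidden layer).

    embedding: encoder output for one segment, length D.
    weights:   one row of length D per class (C rows).
    biases:    one bias per class (length C).
    Returns (probabilities, multi-hot predictions).
    """
    probs = []
    for w_row, b in zip(weights, biases):
        logit = sum(w * x for w, x in zip(w_row, embedding)) + b
        probs.append(sigmoid(logit))
    # Each class is thresholded on its own, so several can be active at once.
    return probs, [int(p >= threshold) for p in probs]
```

Because each class gets its own sigmoid rather than a shared softmax, the classifier itself does not model relationships between labels; any such structure would have to come from the pre-trained encoder's embedding.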

Overlapping events are discussed in 3.6.

The loss function is either BCE loss or Focal loss.
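For my own reference, a plain sketch of the two losses for a single multi-label prediction vector. The focal loss defaults `gamma=2.0` and `alpha=0.25` are the commonly cited values from the original focal loss formulation, not values taken from this paper, and averaging over classes is my assumption.

```python
import math


def bce_loss(probs, targets, eps=1e-7):
    """Binary cross-entropy averaged over classes."""
    total = 0.0
    for p, y in zip(probs, targets):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(probs)


def focal_loss(probs, targets, gamma=2.0, alpha=0.25, eps=1e-7):
    """Focal loss: BCE down-weighted for well-classified examples."""
    total = 0.0
    for p, y in zip(probs, targets):
        p = min(max(p, eps), 1.0 - eps)
        p_t = p if y == 1 else 1.0 - p            # probability of the true label
        alpha_t = alpha if y == 1 else 1.0 - alpha  # class-balance weight
        total += -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)
    return total / len(probs)
```

With `gamma=0` and `alpha=0.5` the focal loss reduces to half the BCE loss; larger `gamma` shrinks the contribution of easy examples, which may help when most segments contain no event.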

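It might be better to use my senior lab member's work > 音を用いた行動認識モデルの知識蒸留による軽量化および精度向上 (roughly: "Lightweighting and accuracy improvement of a sound-based activity recognition model via knowledge distillation")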